Abstract. We present a new efficient algorithm for the construction of linear
latent structure (LLS) models. The algorithm reduces the problem of estimating
model parameters to a sequence of linear algebra problems, which ensures
low computational complexity and makes it possible to handle, on desktop
computers, data that involve up to thousands of variables.



The class of linear latent structure (LLS) models belongs to a family of latent 
structure models, which, in turn, is a subfamily of a family of mixed distribution 
models. Such models naturally occur when a population of interest is supposed to 
be heterogeneous. 

The most widely used methods for estimating latent structure models are based
on maximization of the likelihood function. These are well-established methods
possessing many good properties. Nevertheless, they have limitations, which may
restrict or even prevent their usage. First, the number of parameters to be optimized
is proportional to the number of variables (measurements), which in practice limits
the number of variables used in the analysis to several dozen. Second, the likelihood
function in the case of latent structure analysis is often multimodal, which requires
additional techniques to ensure that the absolute maximum is found.

Our algorithm is based on methods of linear algebra, which eliminates the
problem of multimodality and allows us to analyze up to thousands of variables.
The time spent by the algorithm is proportional to the cube of the number of
variables.

Historically, the predecessor of LLS analysis was grade of membership (GoM)
analysis, which was introduced by Woodbury; see also Manton et al. (1994)
for a detailed exposition and additional references. Our work on LLS analysis
originated from attempts to find conditions for consistency of GoM estimators. The
development eventually led to a new class of models, which differ from GoM
models in how the model is formulated, in the methods of model estimation, and
in the meaning of the estimators and their interpretation.



1. Basic notions 

LLS analysis considers $J$ discrete measurements, represented by a vector of
random variables $X = (X_1, \dots, X_J)$, with the set of outcomes of the $j$-th
measurement (i.e. the set of possible values of the random variable $X_j$) being
$\{1, \dots, L_j\}$. We consider the distribution law of this random vector as a
mixture of independent distribution laws, i.e. distribution laws satisfying

$$P(X_1 = l_1, \dots, X_J = l_J) = \prod_{j=1}^{J} P(X_j = l_j).$$

Representation of the observed distribution law as a mixture of independent
distribution laws is standard for latent structure analysis (and it is its defining
characteristic).

Due to independence, the description of an independent distribution law requires
only knowing $P(X_j = l)$. Thus, an independent distribution law may be described
by an $|L| = L_1 + \dots + L_J$-dimensional vector $\beta = (\beta_{jl})_{jl}$, where
$j$ ranges from 1 to $J$, and for every $j$, $l$ ranges from 1 to $L_j$;
$\beta_{jl} = P(X_j = l)$. Let $\mu_\beta$ be a mixing measure producing the
observed distribution.

The main LLS assumption is that for some integer $K$, $\mu_\beta$ is supported by a
$K$-dimensional linear subspace $Q$ of $\mathbb{R}^{|L|}$. Later, we refer to this
$K$ as the dimensionality of the LLS problem.

This assumption is essentially equivalent to the assumption that there exists a
$K$-dimensional random vector $G$ such that for every $j$ the regression of $Y_j$
on $G$ is linear. Here $Y_j$ is an $L_j$-dimensional random vector, $Y_j = 1_l$ if
$X_j = l$ (where $1_l$ denotes a vector which has its $l$-th component equal to 1
and all other components equal to 0). Namely, let
$\Lambda = \{\lambda^1, \dots, \lambda^K\}$, $\lambda^k = (\lambda^k_{jl})_{jl}$, be
any basis of $Q$, and for $\beta \in Q$, let $g = (g_k)_{k=1,\dots,K}$ be its
coordinates in the basis $\Lambda$. Then the random vector $G$ is the random vector
$\beta$ (distributed according to $\mu_\beta$) written in coordinates $g$, and the
matrices $\Lambda_j = (\lambda^k_{jl})_{kl}$ are linear regression matrices.
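
The construction above can be illustrated by a small simulation. This is only a sketch under stated assumptions: the basis values in `lam`, the Dirichlet law used for $G$, and the function name `sample_individual` are hypothetical choices made for the example, not part of the model definition. Taking each basis vector to be a valid independent law and $G$ to lie on the unit simplex is merely a convenient way to guarantee that $\beta = \sum_k G_k \lambda^k$ is again a valid set of probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

J, L, K = 3, [2, 2, 2], 2          # three binary measurements, K = 2

# Hypothetical basis of Q: each row is one lambda^k, stored as a flat vector
# of length |L| = L_1 + ... + L_J (blocks of size L_j, each summing to 1).
lam = np.array([
    [0.9, 0.1, 0.8, 0.2, 0.7, 0.3],   # lambda^1
    [0.2, 0.8, 0.3, 0.7, 0.1, 0.9],   # lambda^2
])

def sample_individual(rng):
    """Draw G, form beta = sum_k G_k lambda^k, then draw X given beta."""
    g = rng.dirichlet(np.ones(K))      # assumed mixing law on the unit simplex
    beta = g @ lam                     # a point of Q in the original coordinates
    x, start = [], 0
    for Lj in L:
        p = beta[start:start + Lj]     # P(X_j = l | G = g) for l = 1..L_j
        x.append(int(rng.choice(Lj, p=p)) + 1)
        start += Lj
    return g, beta, x

g, beta, x = sample_individual(rng)
```

Because `beta` is a linear function of `g`, the conditional expectation of each indicator vector $Y_j$ given $G = g$ is linear in $g$, which is the regression property stated above.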

The linear regression assumption is crucial for understanding the meaning of 
the LLS model and gives guidelines for its applicability. It essentially means that 
the measurements are not chosen arbitrarily but rather to reflect in some degree 
a hidden property, or a hidden state, represented by the random vector G. LLS 
analysis is about how to discover this hidden state and describe it as precisely as 
possible. 

Let $\mu_g$ be the measure $\mu_\beta$ written in coordinates $g$.

Let $\ell = (\ell_1, \dots, \ell_J)$ be an integer vector with
$0 \le \ell_j \le L_j$. Such a vector represents the outcome of $J$ measurements,
and $\ell_j = 0$ means that we do not take into account the outcome of the $j$-th
measurement. Thus, a value of $\ell_j = 0$ in a vector $\ell$ means that the vector
is a marginal vector across all values of the $j$-th measurement. Let
$\mathcal{L}^0$ be the set of all such vectors, and for every
$\mathfrak{J} \subseteq \{1, \dots, J\}$ let $\mathcal{L}^{[\mathfrak{J}]}$ be the
set of vectors having 0's exactly at the positions in $\mathfrak{J}$. Let
$v = (v_1, \dots, v_K)$ be an integer vector with $v_k \ge 0$, and for every integer
$J' \ge 0$ let $V[J']$ be the set of such vectors satisfying the additional
condition $\sum_k v_k = J'$.

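In this language, the values of interest are unconditional moments of the
distribution $\mu_\beta$,

(1)  $M_\ell(\mu_\beta) = \int \prod_{j : \ell_j \ne 0} \beta_{j\ell_j} \, \mu_\beta(d\beta)$

(2)  $M_\ell(\mu_g) = \int \prod_{j : \ell_j \ne 0} \sum_{k} g_k \lambda^k_{j\ell_j} \, \mu_g(dg)$

and conditional moments of the distribution $\mu_g$,

(3)  $E(G^v \mid X = \ell) = \dfrac{\int g^v \prod_{j : \ell_j \ne 0} \sum_{k} g_k \lambda^k_{j\ell_j} \, \mu_g(dg)}{M_\ell(\mu_g)}$, where $g^v = \prod_k g_k^{v_k}$.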

The unconditional moments $M_\ell(\mu_\beta)$ are the probabilities of obtaining
the response pattern $\ell$ (under the assumptions of the model). Thus, frequencies
of response patterns $\ell$ in a sample, denoted $f_\ell$, are consistent and
efficient estimators for the unconditional moments $M_\ell(\mu_\beta)$.
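
As an illustration of how the frequencies $f_\ell$ could be computed, here is a minimal sketch. It assumes the sample is stored as an $n \times J$ integer array with outcomes coded $1, \dots, L_j$; the function name `pattern_frequency` and the toy data are hypothetical.

```python
import numpy as np

def pattern_frequency(data, pattern):
    """Relative frequency f_ell of a response pattern; 0 in `pattern` means 'any outcome'."""
    data = np.asarray(data)
    pattern = np.asarray(pattern)
    used = pattern != 0                              # measurements taken into account
    match = np.all(data[:, used] == pattern[used], axis=1)
    return match.mean()

# Toy sample: 5 individuals, J = 3 binary measurements (outcomes coded 1 or 2).
sample = [[1, 1, 2],
          [1, 2, 2],
          [2, 1, 1],
          [1, 1, 1],
          [2, 2, 2]]

f_100 = pattern_frequency(sample, [1, 0, 0])   # share of individuals with X_1 = 1
f_110 = pattern_frequency(sample, [1, 1, 0])   # share with X_1 = 1 and X_2 = 1
f_000 = pattern_frequency(sample, [0, 0, 0])   # empty condition
```

A zero entry simply drops the corresponding measurement from the comparison, so the same function yields both full response patterns and their marginals.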

The conditional moments $E(G^v \mid X = \ell)$ express our knowledge of the state
of the individual (represented by the random vector $G$) based on the outcomes of
the measurements. These values are not directly estimable from the observations.
The goal of LLS analysis is to obtain estimates for these conditional moments.

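The most important relation connecting unconditional moments, conditional
moments and the basis $\Lambda$ (in which conditional moments are calculated) is:

(4)  $\sum_k \lambda^k_{jl} \cdot M_\ell(\mu_\beta) \cdot E(G^{v+1_k} \mid X = \ell) = M_{\ell + l_j}(\mu_\beta) \cdot E(G^v \mid X = \ell + l_j)$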
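
A worked special case may help in reading relation (4). The lines below assume that the relation has the form $\sum_k \lambda^k_{jl} M_\ell E(G^{v+1_k} \mid X = \ell) = M_{\ell + l_j} E(G^v \mid X = \ell + l_j)$, that $G^v$ denotes $\prod_k G_k^{v_k}$, and that $\ell + l_j$ denotes $\ell$ with its zero in position $j$ replaced by $l$; these readings are assumptions made for the illustration.

```latex
% Case v = (0, ..., 0): G^v = 1, hence E(G^v | X = \ell + l_j) = 1 and
\sum_{k=1}^{K} \lambda^k_{jl}\, M_\ell(\mu_\beta)\, E(G_k \mid X = \ell)
  = M_{\ell + l_j}(\mu_\beta).

% If, in addition, \ell = (0, ..., 0), then M_\ell = 1 and nothing is conditioned on:
\sum_{k=1}^{K} \lambda^k_{jl}\, E(G_k) = P(X_j = l).
```

The last line says that the marginal probabilities of the outcomes are the coordinates, in the original space, of the mean of the mixing distribution.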



2. The main system of equations 
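We have shown in Kovtun et al. (2005) that the LLS model defined above is fully
described by a system of equations (with respect to variables $\lambda^k_{jl}$ and
$h^v_\ell$)

(5)
$$
\begin{cases}
\sum_k \lambda^k_{jl} h^{v+1_k}_\ell = h^v_{\ell + l_j}, &
  J' \in [0..J-1],\ v \in V[J'],\
  \mathfrak{J} \subseteq [1..J] : |\mathfrak{J}| > J',\
  \ell \in \mathcal{L}^{[\mathfrak{J}]},\ j \in \mathfrak{J},\ l \in [1..L_j] \\
\sum_l \lambda^k_{jl} = 1, & k \in [1..K],\ j \in [1..J] \\
h^{(0,\dots,0)}_\ell = M_\ell, & \ell \in \mathcal{L}^0
\end{cases}
$$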



In this system, the first group of equations corresponds to the main relation
between moments (4), and the last two equations are normalization conditions.
We have proven the following properties of the main system:

(1) Any basis $\Lambda$ of $Q$ together with conditional moments
$E(G^v \mid X = \ell)$ calculated in this basis gives a solution of (5) (the
components $\lambda^k_{jl}$ of the basis vectors should be substituted for the
variables $\lambda^k_{jl}$, and $M_\ell(\mu_\beta) \cdot E(G^v \mid X = \ell)$
should be substituted for $h^v_\ell$).

(2) Under mild conditions, every solution of (5) gives a basis of $Q$ and
conditional moments calculated in this basis.

As the main system of equations fully describes the model, an important
property of the LLS analysis follows: the mixing distribution is not fully
identifiable. Only a finite number of moments may be found by solving the system,
and any mixing distribution that has these moments would satisfy the system. The
fact of nonidentifiability also follows from the general theorem about
identifiability of mixtures, because the family of distributions contained in $Q$
is not linearly independent.

The attractive feature of the LLS analysis is that it can discover a number of
useful invariants of the mixture. The supporting plane of the mixing distribution is
defined uniquely, and low-order moments are identifiable as well. This information
is sufficient to make practically substantial conclusions about the population under
consideration.

M.KOVTUN, I.AKUSHEVICH, K.G.MANTON, AND H.D.TOLLEY

The main system of equations provides a means for consistent estimation of
model parameters. The solution of this system depends continuously on the
unconditional moments $M_\ell$; thus, substituting frequencies $f_\ell$ for moments
$M_\ell$ gives a system whose solutions converge to the true values of the
parameters when the frequencies converge to the true moments.

One good property of the main system of equations is that it is linear with
respect to the variables $h^v_\ell$. Thus, if the supporting plane of the
distribution is known, the conditional moments (3) may be obtained by solving a
linear system of equations. It happens that the supporting plane may be found
independently by analysis of the moment matrix, which we describe in the next
section.

3. The moment matrix 

Let us write a vector of moments $(M_{l_j})_{jl}$ together with incomplete vectors
of moments $(M_{l_j + l'_{j'}})_{jl,\, j \ne j'}$, etc., as columns of a matrix,
with places for which we do not have moments filled by question marks (here $l_j$
denotes the vector with $l$ in position $j$ and zeros elsewhere). We refer to this
incomplete matrix as the moment matrix. The moment matrix contains a column for
every $\ell \in \mathcal{L}^0$. Figure 1 gives an example of a portion of a moment
matrix for the case $J = 3$, $L_1 = L_2 = L_3 = 2$. Columns in this matrix
correspond to $\ell$ = (000), (100), (200), (010), (020), (001), (002), (110);
other columns are not shown.
Figure 1. Example of moment matrix

Note that certain moments (which are replaced by question marks in the moment 
matrix) are not observable. The reason for this is that we are not able to perform a 
measurement on an individual multiple times independently, and since individuals 
are heterogeneous (have different probabilities of outcomes of measurements), we 
do not have multiple realizations of independent identically distributed random 
variables. 
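
The layout of the moment matrix can be sketched in code. The example below is an illustration under assumptions: it takes the entry in row $(j, l)$ of column $\ell$ to be $M_{\ell + l_j}$ when $\ell_j = 0$ and unobservable otherwise, uses `NaN` in place of question marks, and computes exact moments from a hypothetical two-class mixture (`w`, `beta`) rather than from data.

```python
import itertools
import numpy as np

L = [2, 2, 2]                                   # J = 3 binary measurements
rows = [(j, l) for j, Lj in enumerate(L) for l in range(1, Lj + 1)]

# Hypothetical finite mixture: class weights w and class-specific
# probabilities beta[c, j, l-1] = P(X_j = l | class c).
w = np.array([0.4, 0.6])
beta = np.array([[[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
                 [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]])

def moment(ell):
    """M_ell: probability of pattern ell, where 0 marks an ignored measurement."""
    per_class = np.ones(len(w))
    for j, lj in enumerate(ell):
        if lj != 0:
            per_class = per_class * beta[:, j, lj - 1]
    return float(w @ per_class)

def moment_matrix(max_order=1):
    """Columns: patterns with at most `max_order` nonzero entries; NaN stands for '?'."""
    cols = [ell for ell in itertools.product(*[range(Lj + 1) for Lj in L])
            if sum(v != 0 for v in ell) <= max_order]
    M = np.full((len(rows), len(cols)), np.nan)
    for c, ell in enumerate(cols):
        for r, (j, l) in enumerate(rows):
            if ell[j] == 0:                     # observable only if X_j is unused in ell
                ext = list(ell)
                ext[j] = l
                M[r, c] = moment(ext)
    return cols, M

cols, M = moment_matrix()
```

With real data the calls to `moment` would be replaced by frequencies of the corresponding response patterns, which gives the frequency matrix used by the algorithm.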

For a moment matrix $M$ let its completion $\bar{M}$ be a matrix obtained from $M$
by replacing question marks by arbitrary numbers. We have shown that the moment
matrix always has a completion in which all columns belong to the supporting
plane $Q$. Thus, if the moment matrix has sufficient rank (which is the case in
non-degenerate situations), a basis of $Q$ may be obtained from this matrix. As we
have a consistent estimator of the moment matrix in the form of a frequency matrix,
the supporting plane may be consistently estimated.

This property of the moment matrix suggests an efficient algorithm to obtain 
LLS estimates. First, a basis of the supporting plane can be obtained from the 
moment matrix (a way to do this is described in the next section), and second, 
conditional moments can be found by solving a linear system of equations. 

4. Algorithm 

As suggested by the structure of the main system of equations (5) and by the
properties of the moment matrix, the algorithm is naturally decomposed into two
parts. In the first step, a basis of the supporting plane is constructed; the
input for this step is the frequency matrix. In the second step, a system of linear
equations is solved to obtain estimates for conditional expectations.
Step 1: Finding the supporting plane. As for the model distribution all columns of
the moment matrix belong to the supporting plane, and as the frequency matrix
is an approximation of the moment matrix, the natural way to search for the
supporting plane is to search for a plane that minimizes the sum of distances from
it to the columns of the frequency matrix. In our case, however, this approach is
complicated by at least three obstacles: (a) the sought basis $\Lambda$ should
exactly satisfy the conditions $\sum_l \lambda^k_{jl} = 1$ for every $k$ and $j$;
(b) the statistical inaccuracy of the approximation of moments $M_\ell$ by
frequencies $f_\ell$ varies considerably over the elements of the frequency matrix;
(c) the moment matrix (and, correspondingly, the frequency matrix) is incomplete.

The suggested algorithm for estimating the supporting plane consists of the
following steps.

(i) The computational rank of the frequency matrix is estimated. For this,
we take the biggest minor of the frequency matrix that does not contain
question marks. (For the example given in Figure 1 it is the left bottom
minor of size $3 \times 3$.) Then we calculate its singular value decomposition
(SVD) and take $K_0$ (the first approximation of the dimensionality of the LLS
problem) equal to the number of singular values that are greater than the
standard deviation of the norm of the columns involved in the minor. (The final
value for the dimensionality of the LLS problem will be chosen in step (v).)
As one of the requirements for applicability of the LLS model is $K \ll |L|$,
nothing can be done further if all (or too many) singular values are greater
than the standard deviations.

(ii) We construct a completion of the frequency matrix by means of the following
procedure. For every column $c$ of the frequency matrix and row $jl$ of
a question mark in $c$, we select $K_0$ columns $c^{(1)}, \dots, c^{(K_0)}$
satisfying: (a) all columns $c^{(k)}$ contain a value (not a question mark) in
row $jl$; (b) there exist $p > K_0$ rows such that all columns
$c, c^{(1)}, \dots, c^{(K_0)}$ contain values in these rows. Let $c[p]$ be a
subcolumn containing only the selected rows. Then we solve the linear system
$\alpha_1 c^{(1)}[p] + \dots + \alpha_{K_0} c^{(K_0)}[p] = c[p]$ and replace the
question mark at the position $c_{jl}$ by
$\alpha_1 c^{(1)}_{jl} + \dots + \alpha_{K_0} c^{(K_0)}_{jl}$. The system to
be solved is overdetermined; we solve it by minimization of residuals using
SVD. When $K_0$ is sufficiently smaller than $|L|$, the required selection of the
columns is possible for every column $c$ which contains at least $K_0$ values;
columns containing fewer than $K_0$ values are discarded from further
consideration. According to Kovtun et al. (2005), the moment matrix always has
a completion of rank equal to the dimensionality of the LLS problem; thus, this
method should give good results.

(iii) Columns are normalized, so that the condition $\sum_l c'_{jl} = 1$ holds for
every $j$. This is always possible, as for every column $c$ we have
$\sum_l c_{jl} = s$, where $s$ does not depend on $j$. Thus, we take
$c' = \frac{1}{s} c$.

(iv) Next, we remove the restriction $\sum_l c_{jl} = 1$ by reducing the number of
rows by $J$ (1 for every group of indices $j1, \dots, jL_j$). For this, we use a
linear map from $\mathbb{R}^{|L|}$ to $\mathbb{R}^{|L|-J}$ given by a
block-diagonal matrix $A$ with $J$ blocks $A_j$ (transformation (6)), each of size
$L_j \times (L_j - 1)$. This map is an isometry of the subspace of
$\mathbb{R}^{|L|}$ defined by the equations $\sum_l c_{jl} = 1$ to
$\mathbb{R}^{|L|-J}$ (every block $A_j$ defines a rotation of a unit simplex in
$L_j$-dimensional space around the face opposite to the first vertex; the angle of
this rotation is such that the first vertex moves to the point with the first
coordinate equal to 0).
(v) Now we have $n$ points $y^1, \dots, y^n$ (images of columns of the frequency
matrix) in $m = |L| - J$-dimensional space. The problem is to find an affine plane
that minimally deviates from these points. First, we find the center of gravity
of this system,

$y^0 = \frac{1}{n} \sum_{i=1}^{n} y^i$,

and then consider a new set of points $x^1 = y^1 - y^0, \dots, x^n = y^n - y^0$.
We need to find a linear subspace in $\mathbb{R}^m$ that minimally deviates from
this set of points. The solution of this problem is well known (see, for example,
chapter 43 of Kendall and Stuart): one has to consider an $m \times m$ matrix
$X = (X_{rs})_{rs}$ with components $X_{rs} = \sum_i x^i_r x^i_s$. This matrix is
symmetric and positive semidefinite, and thus it possesses an orthonormal basis of
eigenvectors. Let $\gamma_1 \ge \gamma_2 \ge \dots \ge \gamma_m \ge 0$ be the
eigenvalues of the matrix $X$, and let $z^1, \dots, z^m$ be the corresponding
eigenvectors. The plane of dimensionality $p$ that minimizes the sum of squared
distances from the points $x^1, \dots, x^n$ is spanned by $z^1, \dots, z^p$, and
the sum of squared distances is
$\sum_{i=1}^{n} |x^i|^2 - \sum_{k=1}^{p} \gamma_k$. This gives us a criterion for
the selection of the dimensionality $K$ of the LLS problem: one has to take $K$ to
be the smallest integer such that the eigenvalues $\gamma_K, \dots, \gamma_m$ are
smaller than the inaccuracy in the input data. Vectors
$y^0, y^0 + z^1, \dots, y^0 + z^{K-1}$ give us an affine basis of the sought
affine plane.
(vi) Lastly, we apply inverses of transformation (6) to
$y^0, y^0 + z^1, \dots, y^0 + z^{K-1}$ to obtain the sought basis
$\lambda^1, \dots, \lambda^K$ of the subspace $Q$.
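
Steps (i) and (ii) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the threshold for the rank estimate is left to the caller, donor columns are picked greedily in column order (an arbitrary choice), and `numpy.linalg.lstsq`, which is SVD-based, stands in for the minimization of residuals.

```python
import numpy as np

def estimate_rank(block, threshold):
    """Step (i): number of singular values of a fully observed block above `threshold`."""
    return int((np.linalg.svd(block, compute_uv=False) > threshold).sum())

def fill_entry(F, col, row, K0):
    """Step (ii): estimate the missing entry F[row, col] from K0 donor columns."""
    known = ~np.isnan(F)
    rows_ok = known[:, col].copy()             # rows where the target column has values
    donors = []
    for d in range(F.shape[1]):
        # A donor needs a value in `row` and must leave p > K0 common rows.
        if d != col and known[row, d] and (rows_ok & known[:, d]).sum() > K0:
            donors.append(d)
            rows_ok &= known[:, d]
            if len(donors) == K0:
                break
    if len(donors) < K0:
        raise ValueError("not enough suitable donor columns")
    A = F[np.ix_(rows_ok, donors)]             # p x K0 overdetermined system
    alpha, *_ = np.linalg.lstsq(A, F[rows_ok, col], rcond=None)
    return float(F[row, donors] @ alpha)

# Usage on a synthetic matrix of exact rank 2 with one hidden entry.
rng = np.random.default_rng(1)
full = rng.random((6, 2)) @ rng.random((2, 5))
F = full.copy()
F[0, 3] = np.nan
K0 = estimate_rank(full[1:, :3], threshold=1e-8)
estimate = fill_entry(F, col=3, row=0, K0=K0)
```

On exact low-rank input the hidden entry is recovered up to rounding; with frequencies in place of moments the result is only an approximation, and the choice of donor columns may matter.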

The above algorithm solves the problems (a)-(c) listed in the beginning of the
subsection, and it possesses two important properties that are crucial to its
usefulness: (a) if the input of the algorithm is the true moments of a distribution
generated by a $K$-dimensional LLS model, the output of the algorithm is the true
supporting plane; (b) there exists an open neighbourhood of the true moment matrix
in which the output of the algorithm depends continuously on its input.

The preliminary experiments with the prototype of the algorithm performed
by the authors demonstrated that it restores the supporting plane with a good
degree of precision.
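
The geometric part of Step 1 (steps (iv) and (v)) can be sketched as follows. Both functions are illustrative assumptions: `simplex_block` builds one matrix that matches the geometric description of the blocks $A_j$ (it fixes vertices $2, \dots, L_j$ of the simplex and is an isometry on the hyperplane $\sum_l c_{jl} = 1$), which is not necessarily the exact matrix used by the authors, and `fit_affine_plane` is the standard eigenvector solution of the least-squares plane problem.

```python
import numpy as np

def simplex_block(Lj):
    """An L_j x (L_j - 1) block acting on row vectors: x -> x @ block."""
    t = -1.0 / (np.sqrt(Lj) + 1.0)             # a root of (Lj - 1) t^2 - 2 t - 1 = 0
    # First vertex goes to t * (1, ..., 1); the other vertices keep their coordinates.
    return np.vstack([np.full((1, Lj - 1), t), np.eye(Lj - 1)])

def fit_affine_plane(Y, p):
    """Least-squares affine plane of dimension p through the rows of Y."""
    center = Y.mean(axis=0)                    # center of gravity y^0
    Xc = Y - center                            # points x^i = y^i - y^0
    scatter = Xc.T @ Xc                        # m x m matrix with entries sum_i x^i_r x^i_s
    gamma, Z = np.linalg.eigh(scatter)         # eigenvalues in ascending order
    order = np.argsort(gamma)[::-1]            # reorder so that gamma_1 >= gamma_2 >= ...
    gamma, Z = gamma[order], Z[:, order]
    residual = gamma[p:].sum()                 # sum of squared distances to the plane
    return center, Z[:, :p].T, residual

# Usage: the block preserves distances between points whose coordinates sum to 1.
A3 = simplex_block(3)
a = np.array([0.2, 0.3, 0.5])
b = np.array([0.6, 0.1, 0.3])
dist_before = np.linalg.norm(a - b)
dist_after = np.linalg.norm((a - b) @ A3)

# Usage: points lying exactly on a line in R^3 give a zero residual for p = 1.
s = np.linspace(-1.0, 1.0, 7)[:, None]
Y = np.array([1.0, 2.0, 3.0]) + s * np.array([1.0, 1.0, 0.0])
center, basis, residual = fit_affine_plane(Y, 1)
```

The residual returned for a given `p` equals the sum of the discarded eigenvalues, so comparing it with the inaccuracy of the input data gives the dimensionality criterion of step (v).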

Step 2: Calculation of conditional expectations. When a basis of the supporting
plane is found, the conditional expectations can be found from the main system of
equations, which becomes a linear system after substituting the basis. This is a
sparse overdetermined system; methods for solving such systems are well elaborated;
see, for example, Forsythe et al. (1977); Kahaner et al. (1988).
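
The smallest instance of Step 2 can be sketched as follows. The example assumes that the equations for $v = (0, \dots, 0)$ and $\ell = (0, \dots, 0)$ read $\sum_k \lambda^k_{jl} h_k = M_{l_j}$ with $h_k = E(G_k)$; the basis values and the "true" expectations are hypothetical, and a dense least-squares solver is used for brevity where a sparse solver (for instance `scipy.sparse.linalg.lsqr`) would be the natural choice for large problems.

```python
import numpy as np

# Hypothetical basis of the supporting plane: rows are lambda^1 and lambda^2.
lam = np.array([[0.9, 0.1, 0.8, 0.2, 0.7, 0.3],
                [0.2, 0.8, 0.3, 0.7, 0.1, 0.9]])

# First-order moments M_{l_j} = P(X_j = l), generated here from E(G) = (0.4, 0.6).
true_EG = np.array([0.4, 0.6])
M1 = true_EG @ lam

# |L| = 6 equations, K = 2 unknowns: an overdetermined linear system.
h, residuals, rank, sv = np.linalg.lstsq(lam.T, M1, rcond=None)
```

Higher-order conditional moments lead to the same kind of system, with one block of equations per response pattern $\ell$ and moment index $v$.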